ISCAS-CMICH-Shi-MC2

VAST 2012 Challenge
Mini-Challenge 2:

 

 

Team Members:

 

Lei Shi, Institute of Software, Chinese Academy of Sciences, shijim@gmail.com PRIMARY
Qi Liao, Central Michigan University, qi.liao@cmich.edu

Chunxin Yang, Northwestern Polytechnical University, chunxin11@163.com

Student Team: No

 

Tool(s):

 

The tool is called Network Security and Anomaly Visualization (NSAV). NSAV was derived from ENAVis (Enterprise Network Activities Visualization) which was developed during Qi Liao’s Ph.D. study at the University of Notre Dame. In the graph visualization part, NSAV is based on the toolkit developed by Lei Shi@ISCAS.

 

Video:

 

 

VAST2012-MC2-Lei

 

 

Answers to Mini-Challenge 2 Questions:

 

MC 2.1 Using your visual analytics tools, can you identify what noteworthy events took place for the time period covered in the firewall and IDS logs? Provide screen shots of your visual analytics tools that highlight the five most noteworthy events of security concern, along with explanations of each event.

The firewall/IDS logs of the regional Bank of Money (BoM) network are processed into traffic flow graphs and per-host anomaly lists (see the Data Processing part for details), and then visualized in the NSAV tool (see the Visualization part for details).

All 40 hours of data are loaded into our tool to obtain an abstracted overview of the network traffic within BoM during the inspected time period (see the graph abstraction part for details), as shown in Figure 1(a). We then manually drag and drop the machine groups into larger groups by machine type to generate Figure 1(b), which indicates three major traffic types in BoM: “Workstation Group I <--> Headquarter DCs & Website Group I”, “Workstation Group II & External DNS <--> Regional Domain/DNS server”, “Workstation Group III <--> Website Group II”.


(a) (b)

Figure 1. Overview pictures of the traffic flow graphs in BoM office: (a) after the abstraction. (b) after a further manual grouping.

We then attach the anomaly lists (see Figure 12) detected in the data pre-processing step to the graph links. The result is shown in Figure 2.

Figure 2. The NSAV tool visualizing both the traffic graph abstraction and the anomalies on the graph links (flows). The grouped node (10.32.5.58+), which represents two similar machines (10.32.5.58, 10.32.5.59), is selected in the graph. Their temporal anomaly distributions are plotted in the bottom-right panel. Details of each individual anomaly are shown in a tooltip on mouse hover.

We identify the noteworthy events from the anomaly graph with a divide-and-conquer approach, examining each of the isolated subgraphs (connected components) in turn.

1. IRC Malware Infection: from 172.23.231-240.*, 123-136.*, 0.*, 1.*, 252.*, 254.* (580 workstations in total) to 10.32.5.50-59, 68-69 (12 websites)

In the first subgraph of the traffic network, shown in Figure 3, the “I” and “M” icons appear frequently and almost always in pairs in opposite directions. Selecting the “IRC-Malware-Infection_s” anomaly (icon “I”) in the anomaly type list reveals three groups of machines, highlighted in red in the graph. They are all workstations attacking and sending malware to a portion of the 12 websites (10.32.5.*), highlighted in blue in the graph. After further selecting two typical attackers (172.23.123.105, 172.23.231.174) and two websites (10.32.5.50, 10.32.5.52) in the graph filter panel, the temporal anomaly distributions of these four machines are plotted in the temporal anomaly panel. They show that the attacks on the websites persist throughout the whole inspected time period. Note that a few websites (10.32.5.50, 68, 69) do not reply with the IRC authorization message (icon “M”), but all the other websites (10.32.5.51-59) do reply, indicating that they are running IRC services and are vulnerable to the IRC attacks.

Figure 3. Three groups of machines initiating IRC Malware Infections to the websites through port 6667.

A detailed examination of hosts 172.23.231.174 and 172.23.231.175 (two all-time attackers) shows fine-grained patterns (Figure 4): the attacks are composed of two stages, indicated by a small gap in the middle of 172.23.231.174’s temporal panel. A drill-down analysis of 172.23.231.175 at this gap shows that the first stage ends with a very large source port (43325) and the second stage starts with a relatively small one (1185). After checking the anomaly file of 172.23.231.175 (top-left of Figure 4), we deduce that the attacks from the workstations are probably programmed, with sequentially enumerated source ports on the compromised systems. They can be classified as DoS attacks intended to exhaust the websites’ processing capacity and network bandwidth.

Figure 4. Detailed inspection of the temporal anomaly distributions of 172.23.231.174 and 172.23.231.175. Two stages of source-port-sweep IRC attacks with programmed behavior are identified.
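The two-stage pattern described above can be checked mechanically once the source ports of one host are listed in time order. The following is a minimal sketch, not part of NSAV: the function name split_port_sweeps is hypothetical, and the sample port list is made up for illustration, apart from the stage boundary values 43325 and 1185 reported above.

```python
from typing import List, Tuple


def split_port_sweeps(ports: List[int]) -> List[Tuple[int, int]]:
    """Split a time-ordered list of source ports into non-decreasing runs.

    Returns (first_port, last_port) for each run. A new run starts whenever
    the port number drops, which is how a restart of a sequential port
    enumeration would appear in the anomaly file.
    """
    runs: List[Tuple[int, int]] = []
    if not ports:
        return runs
    start = prev = ports[0]
    for port in ports[1:]:
        if port < prev:  # the enumeration restarted at a lower port
            runs.append((start, prev))
            start = port
        prev = port
    runs.append((start, prev))
    return runs


# Illustrative input only: NOT copied from the challenge data, except that
# the first run ends at 43325 and the second starts at 1185 as in the text.
sample_ports = [43320, 43322, 43325, 1185, 1186, 1190]
print(split_port_sweeps(sample_ports))  # [(43320, 43325), (1185, 1190)]
```

A host whose ports form only a few long runs is consistent with a programmed sweep; many short, unordered runs would point to ordinary traffic instead.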

 

2. Failed FTP/SSH connection: from 172.23.231-240.*, 123-136.*, 0.*, 1.*, 252.*, 254.* (580 workstations in total) to 10.32.5.50-59, 68-69 (12 websites)

In the same subgraph, we also find anomalies on the workstations indicating FTP/SSH connections to the websites (Figure 5). The connection attempts concentrate on 10.32.5.50-57, indicated by the grey versions of icons “C” and “S”. We split the potential sources into sub-groups by anomaly type and select the destination machine group 10.32.5.50-57. The temporal anomaly panel shows that the first stage of the FTP/SSH connections falls mostly within the first 6 hours, either in parallel with or preceding the IRC attacks. In the second stage, synchronized with the second stage of the IRC attacks, only FTP connections are attempted. In both stages, no reply from the websites is recorded. This behavior suggests that the compromised systems (workstations) probe for FTP/SSH services on the websites, probably in preparation for subsequent DoS attacks. However, no further events happen in this thread, since no FTP/SSH services are hosted on the websites.

Figure 5. FTP/SSH connection attempts to websites 10.32.5.50-57. The related workstation machines are grouped both by the neighbor set and the node anomaly types.

 

3. Failed database/remote desktop/mail services connection: from 172.23.231-240.* (5 workstations) to 172.23.0.1 (external regional network interface)

In the same subgraph, another group of 5 workstations shows more than FTP/SSH connections, identified by the “A”, “T” and “M” icons. We manually group them together and check their temporal anomalies. The extra connections mostly occur at the start of the inspected period. The anomaly descriptions indicate that the connection attempts are potential scans of database (PostgreSQL/Oracle/MySQL), remote desktop (VNC), mail (POP3, IMAP) and other (SNMP) services. The destination IP, 172.23.0.1, is the external interface of the firewall leading out of the regional network, so we cannot know which hosts outside the regional network are scanned. However, we know that none of these connections succeeds, because no reverse traffic is detected.

Figure 6. Database/remote desktop/mail service connection attempts to 172.23.0.1. The related 5 workstation machines are grouped together due to the same set of anomaly types.

 

4. Potential DNS hijacking/spoofing and vulnerability exploits over the DNS server: from 172.23.0.*, 1.*, 5.* (89 workstations) to 172.23.0.10 (Domain/DNS controller)

Another connected component of the traffic graph, shown in Figure 7, is centered on the Domain/DNS server 172.23.0.10. In total, 89 workstations send suspicious traffic to the server, indicated by two types of anomalies. The temporal anomaly panel of Figure 7 shows the details for the server (172.23.0.10) and two typical workstations (172.23.1.104, 172.23.1.105). The “P” icon indicates DNS updates to the server, which we suspect to be DNS hijacking/spoofing attacks, because workstations should not be responsible for updating the DNS table. The “G” icon indicates Generic-Protocol-Command-Decode, which has two sub-classes by description: asn1 buffer overflow attempts and IPC$ share access attempts. Both exploit system vulnerabilities and resources of the server. A drill-down analysis of 172.23.1.105 with an enlarged window shows that the vulnerability/resource exploits follow a regular pattern, one every 15 minutes from each workstation at sequentially enumerated ports, which strongly suggests programmed attacks from compromised systems.

Figure 7. The traffic flow graph centered on the regional domain/DNS server (172.23.0.10). Potential DNS hijacking/spoofing events and vulnerability exploits from a group of 89 workstations to the server are detected.

 

5. Unknown hosts: 172.28.29.* (30 workstations)

In the last connected component, the traffic consists mostly of web page retrievals by 2700+ workstations from 16 websites, as well as financial transactions and web mail access with the headquarter data center. We identify 30 unknown hosts (172.28.29.*), which lie in a different Class B network from the regional BoM machines. These suspicious hosts have two-way connections (or connection attempts) with both the websites and the headquarter data centers. Note that the inbound traffic from the websites and headquarter data centers to these hosts is all denied at the firewall. However, this differs significantly from the other workstations, toward which no inbound traffic or connection attempt is made, so we strongly suspect a source IP spoofing event.

Figure 8. Traffic graph for website visit and headquarter data center access. 30 unknown workstation hosts are identified.

 

MC 2.2 What security trend is apparent in the firewall and IDS logs over the course of the two days included here? Illustrate the identified trend with an informative and innovative visualization.

For the IRC malware infection events shown in Figures 3 and 4, the second cycle of attacks is about to finish at the end of the inspected time period. However, we believe that a third and further cycles of the same type of attack will follow unless measures are taken to mitigate the event. Moreover, comparing the traffic graphs of the first 10 hours and the last 10 hours, as shown in Figure 9, it is straightforward to see that fewer workstations (4+253+57 = 314) are involved in the suspicious attacks in the first period, while more probably compromised workstations (230+318 = 548) are involved in the second.

 

(a) (b)

Figure 9. Comparison of the IRC Malware Infection related traffic during the inspected time period: (a) in the first 10 hours; (b) in the last 10 hours.

Figure 10 compares the DNS server related traffic graphs. The two periods remain almost the same. We can deduce from the previous analysis in Figure 7 that the vulnerability exploits at the DNS server will continue, because the current trial rate is one every 15 minutes from each compromised workstation, reaching only port 25xx at each host by the end of the entire period. This security event will last at least tens of times the length of the inspected period.

(a) (b)

Figure 10. Comparison of the Domain/DNS server-centric traffic graphs: (a) in the first 10 hours; (b) in the last 10 hours.

 

MC 2.3 What do you suspect is (are) the root cause(s) of the events identified in MC 2.1? Understanding that you cannot shut down the corporate network or disconnect it from the internet, what actions should the network administrators take to mitigate the root cause problem(s)?

The root cause of the security events is probably that the workstation systems (172.23.*.*) were compromised through weak passwords and system vulnerabilities, and therefore became members of botnets. Malware running on these workstations continues to scan and exploit security holes in other workstations, servers and external websites. The compromised machines (bots) can also initiate DDoS attacks on the Domain/DNS servers and external websites.

The possible mitigations are:

1) Cut off the command and control (C&C) channels of the botmaster. The enterprise network administrator should block all IRC related traffic involving port 6667.

2) Change the passwords immediately. The first step for many attackers in compromising a system is simply to connect with SSH and guess passwords. The administrator should adopt a strong password policy for users and set up a password expiration and rotation policy.

3) Shut down unnecessary services in the regional network, such as FTP, Remote Desktop, IRC, SSH, Database, etc. Reconfigure the firewall rules so that traffic going to unused service ports is dropped at the firewall in the first place.

4) Run vulnerability scans to identify system vulnerabilities, then patch the systems to the latest standard to protect them from security threats. Run a virus scanning program to remove the viruses.

5) Secure the servers in the regional network, e.g., configure the DNS server to refuse all DNS updates from illegitimate DNS servers. Use public-key authentication for DNS updates.

While these suggestions may be helpful, there is no panacea for all security problems. The proposed Network Security and Anomaly Visualization (NSAV) tool can provide a time-efficient alternative for network operators and administrators, such as those at Bank of Money, not only to detect network security anomalies but, more importantly, to find their root causes if they happen.

 

 

 

 

 

APPENDIX

•  Data Processing Details

The data processing takes two steps:

1. Anomaly parsing

We extract the potential anomalies from the firewall and IDS logs. A common format is defined for all types of anomalies:

<Timestamp>, <Host Machine IP (:port)>, <Anomaly Type>, <Detailed Description>

To parse the firewall logs, we take a whitelist approach. We manually write a good rule set, shown in Figure 11, based on our interpretation of the BoM network operation policies and configurations. This took us approximately 2 hours in total, including the initial setup and the iterative changes to the rule set.

Figure 11. Firewall good rule set:

 

The resulting firewall anomalies are the traffic flows not matched by any of the rules. Each such flow generates a “_s” anomaly on the source machine and a “_d” anomaly on the destination machine. The firewall anomalies are further partitioned into 5 types, as shown in Figure 12. A sample of the firewall anomalies is given below.

Firewall anomaly example:

1333789124,172.23.254.80,IRC-Malware-Infection_s,172.23.254.80:2275 --> 10.32.5.50:6667
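The whitelist step can be sketched as follows. This is only an illustration under stated assumptions: the rule entries in GOOD_RULES are hypothetical placeholders (the actual rule set of Figure 11 is not reproduced here), the names Flow, AllowRule, is_allowed and to_anomalies are ours, and the anomaly type is passed in by the caller because the partition into the 5 types of Figure 12 is not spelled out in the text.

```python
import ipaddress
from typing import List, NamedTuple


class Flow(NamedTuple):
    timestamp: int
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int


class AllowRule(NamedTuple):
    src_net: str   # CIDR block of permitted sources
    dst_net: str   # CIDR block of permitted destinations
    dst_port: int  # permitted destination port


# HYPOTHETICAL rules for illustration; not the rule set of Figure 11.
GOOD_RULES = [
    AllowRule("172.23.0.0/16", "10.32.5.0/24", 80),
    AllowRule("172.23.0.0/16", "172.23.0.10/32", 53),
]


def is_allowed(flow: Flow, rules: List[AllowRule]) -> bool:
    """Return True if at least one whitelist rule matches the flow."""
    src = ipaddress.ip_address(flow.src_ip)
    dst = ipaddress.ip_address(flow.dst_ip)
    return any(
        src in ipaddress.ip_network(rule.src_net)
        and dst in ipaddress.ip_network(rule.dst_net)
        and flow.dst_port == rule.dst_port
        for rule in rules
    )


def to_anomalies(flow: Flow, anomaly_type: str, rules: List[AllowRule]) -> List[str]:
    """Emit one "_s" and one "_d" record for a flow that no rule matches."""
    if is_allowed(flow, rules):
        return []
    desc = f"{flow.src_ip}:{flow.src_port} --> {flow.dst_ip}:{flow.dst_port}"
    return [
        f"{flow.timestamp},{flow.src_ip},{anomaly_type}_s,{desc}",
        f"{flow.timestamp},{flow.dst_ip},{anomaly_type}_d,{desc}",
    ]


flow = Flow(1333789124, "172.23.254.80", 2275, "10.32.5.50", 6667)
for record in to_anomalies(flow, "IRC-Malware-Infection", GOOD_RULES):
    print(record)
```

With these placeholder rules, the port 6667 flow from the example above matches nothing and therefore yields both a source-side and a destination-side record.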

 

For the IDS logs, all records are kept as anomalies, because the IDS has already performed the filtering. A sample is given below. Six IDS anomaly types are present in the data.

IDS log anomaly sample:

1333737240,172.23.254.80,Misc-activity_d,ET POLICY IRC authorization message 10.32.5.55: 6667-->172.23.254.80:1534
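Both samples follow the common record format given earlier, so a single parser can read them. The sketch below is an assumption about how such records could be consumed, not the NSAV implementation: the Anomaly type and parse_anomaly function are hypothetical names, and the code assumes that every anomaly type ends in "_s" or "_d" and that only the free-text description may contain commas.

```python
from typing import NamedTuple, Optional


class Anomaly(NamedTuple):
    timestamp: int
    host_ip: str
    host_port: Optional[int]  # present only if the host field is "ip:port"
    anomaly_type: str
    role: str                 # "s" for source side, "d" for destination side
    description: str


def parse_anomaly(line: str) -> Anomaly:
    """Parse '<Timestamp>,<Host IP(:port)>,<Anomaly Type>,<Description>'."""
    # Split on the first three commas only; the description is kept whole.
    ts, host, typed, desc = (field.strip() for field in line.split(",", 3))
    ip, _, port = host.partition(":")
    # Assumes the "_s"/"_d" suffix is always present on the type field.
    anomaly_type, _, role = typed.rpartition("_")
    return Anomaly(int(ts), ip, int(port) if port else None,
                   anomaly_type, role, desc)


firewall = parse_anomaly(
    "1333789124,172.23.254.80,IRC-Malware-Infection_s,"
    "172.23.254.80:2275 --> 10.32.5.50:6667"
)
ids = parse_anomaly(
    "1333737240,172.23.254.80,Misc-activity_d,"
    "ET POLICY IRC authorization message 10.32.5.55: 6667-->172.23.254.80:1534"
)
print(firewall.anomaly_type, firewall.role)  # IRC-Malware-Infection s
print(ids.anomaly_type, ids.role)            # Misc-activity d
```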

 

Figure 12. Types of network anomalies detected in BoM regional office.

 

2. Traffic flow graph generation

The traffic flow graphs, which indicate the live network topology, are constructed directly from the firewall NetFlow data, in which each source IP address and port has an established connection state with a destination IP address and port. For conciseness, only IP-level connections are used as network edges. Time is partitioned by a preset window size, 3600 s by default. Each flow is recorded in every consecutive time window between its built and teardown timestamps. One flow graph is generated per time window for flexibility. During online visualization, the user can select several consecutive time slots, and the corresponding graphs are aggregated on the fly. A caching mechanism is applied to speed up the processing.
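The windowing and on-the-fly aggregation can be sketched as below. This is a simplified illustration rather than the actual NSAV code: the function names, the tuple layout of a flow, and the toy host labels and timestamps are assumptions, and the caching mentioned above is left out.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

WINDOW = 3600  # default window size in seconds, as stated in the text

Edge = Tuple[str, str]                # (source IP, destination IP)
FlowRecord = Tuple[str, str, int, int]  # (src, dst, built_ts, teardown_ts)


def build_window_graphs(flows: Iterable[FlowRecord],
                        window: int = WINDOW) -> Dict[int, Set[Edge]]:
    """Record each flow's IP-level edge in every window it overlaps."""
    graphs: Dict[int, Set[Edge]] = defaultdict(set)
    for src, dst, built, teardown in flows:
        for idx in range(built // window, teardown // window + 1):
            graphs[idx].add((src, dst))
    return dict(graphs)


def aggregate(graphs: Dict[int, Set[Edge]], first: int, last: int) -> Set[Edge]:
    """Union the edge sets of consecutive windows first..last (inclusive)."""
    edges: Set[Edge] = set()
    for idx in range(first, last + 1):
        edges |= graphs.get(idx, set())
    return edges


# Toy flows with made-up hosts and timestamps (seconds since the log start).
flows = [
    ("hostA", "hostB", 100, 200),    # stays inside window 0
    ("hostA", "hostC", 3500, 7300),  # spans windows 0, 1 and 2
]
graphs = build_window_graphs(flows)
print(sorted(graphs[0]))                # [('hostA', 'hostB'), ('hostA', 'hostC')]
print(sorted(aggregate(graphs, 1, 2)))  # [('hostA', 'hostC')]
```

Keeping one edge set per window means that any user-selected range of time slots reduces to a set union, which is the kind of result a cache can store per range.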

 

•  Visualization Design

1. Huge Graph Visualization through Loss-Free Abstraction

All the graph visualizations shown in this submission apply the loss-free graph abstraction method. The basic idea is to group nodes with the same neighbor set together as mega-nodes. The node and edge attributes of a mega-node are aggregated from the underlying original nodes. In most cases, the graph abstraction reduces the graph complexity (measured by the number of nodes) by more than 95%, and in this case by 99.5%. The abstracted graph is guaranteed to preserve many critical features of the original graph: connectivity, shortest paths, node affinity and, importantly, all the connections (flows in the security graph). The graph abstraction algorithm is deterministic, single-pass, and scalable to graphs of a million nodes. For more details of the graph abstraction method, the reader can refer to the technical report here: CNG Report (unpublished manuscript, all rights reserved).
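The basic grouping idea can be illustrated with a few lines of code. This is a minimal sketch of grouping by identical neighbor sets only, not the algorithm of the CNG report: it treats edges as undirected, ignores attribute aggregation, and uses a hypothetical function name and toy node labels.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Group = Tuple[str, ...]


def abstract_graph(edges: Iterable[Tuple[str, str]]):
    """Group nodes that share exactly the same neighbor set into mega-nodes.

    Returns (groups, mega_edges): the list of mega-nodes (sorted tuples of
    member names) and the set of edges between mega-nodes.
    """
    edges = list(edges)
    neighbors: Dict[str, Set[str]] = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)

    # Nodes with an identical neighbor set get the same signature.
    by_signature: Dict[frozenset, List[str]] = defaultdict(list)
    for node, nbrs in neighbors.items():
        by_signature[frozenset(nbrs)].append(node)

    groups: List[Group] = [tuple(sorted(m)) for m in by_signature.values()]
    group_of = {node: group for group in groups for node in group}
    mega_edges = {
        tuple(sorted((group_of[u], group_of[v]))) for u, v in edges
    }
    return groups, mega_edges


# Toy graph: w1 and w2 both talk only to s1, so they collapse into one node.
toy_edges = [("w1", "s1"), ("w2", "s1"), ("w3", "s1"), ("w3", "s2")]
groups, mega_edges = abstract_graph(toy_edges)
print(sorted(groups))   # [('s1',), ('s2',), ('w1', 'w2'), ('w3',)]
print(len(mega_edges))  # 3
```

Because every member of a mega-node has the same neighbors, each original edge can be recovered from the mega-edges and the group membership, which is what makes this kind of grouping loss-free for connections.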

2. Temporal Anomaly Visualization

In the bottom-right panel of the NSAV tool, a temporal visualization presents the distribution of anomalies on the selected hosts. The horizontal axis is the time dimension and the vertical axis is the anomaly type, with one row per type. Each anomaly is plotted as one icon, whose letter indicates the anomaly type (Figure 12). The view can be zoomed both vertically and horizontally. Hovering the mouse over an icon shows the details of the anomaly.